Abstract

We analyze three sets of income data: the US Panel Study of Income Dynam- 
ics (PSID), the British Household Panel Survey (BHPS), and the German Socio- 
Economic Panel (GSOEP). It is shown that the empirical income distribution is con- 
sistent with a two-parameter lognormal function for the low-middle income group 
(97%-99% of the population), and with a Pareto or power law function for the high 
income group (1%-3% of the population). This mixture of two qualitatively
different analytical distributions seems stable over the years covered by our data sets,
although their parameters significantly change in time. It is also found that the 
probability density of income growth rates almost has the form of an exponential 
function. 
1 Introduction



More than a century ago, the economist Vilfredo Pareto stated in his Cours 
d'Economie Politique that there is a simple law which governs the distribution 
of income in all countries and at all times. Briefly, if N represents the
number of income-receiving units with incomes above a certain
income limit x, and A and α are constants, then:

$$N = \frac{A}{x^{\alpha}} \qquad (1)$$

and, therefore, $\log(N) = \log(A) - \alpha \log(x)$. In other words, if the logarithms
of the number of persons in receipt of incomes above definite amounts are
plotted against the logarithms of the amount of these incomes, the points so
obtained will lie on a straight line whose inclination to the axis on which the
values of log(x) are given is α (that is, a line of slope $-\alpha$). Pareto examined
the statistics of incomes in some countries and concluded that the inclination
of the line to the log(x) axis differed but little from 1.5.
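As a rough illustration of the log-log relationship described above, the following Python sketch estimates the inclination α with an ordinary least-squares line through the points (log x, log N). The function name `pareto_slope` and the synthetic sample are assumptions introduced only for illustration; they are not Pareto's procedure nor the data analyzed in this paper.

```python
import math
import random

def pareto_slope(incomes):
    """Return (alpha, A) from a least-squares line through (log x, log N(x))."""
    xs = sorted(incomes)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # N(x_i): number of units with income at or above the i-th smallest value
    log_n = [math.log(n - i) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_n = sum(log_n) / n
    cov = sum((a - mean_x) * (b - mean_n) for a, b in zip(log_x, log_n))
    var = sum((a - mean_x) ** 2 for a in log_x)
    slope = cov / var
    intercept = mean_n - slope * mean_x
    # log N = log A - alpha * log x, so alpha is minus the fitted slope
    return -slope, math.exp(intercept)

random.seed(0)
# Synthetic Pareto sample with alpha = 1.5 and minimum income 1 (not survey data)
sample = [random.paretovariate(1.5) for _ in range(20000)]
alpha_hat, a_hat = pareto_slope(sample)
print(f"estimated alpha = {alpha_hat:.2f}")
```

On a sample that is Pareto over its whole range the estimate should land close to the value used to generate it; on real income data the line would be fitted only to the upper tail.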

Recently, extensive investigations with modern data in capitalist
economies have revealed that the upper tail of the income distribution (generally
less than 5% of the individuals) indeed follows the above-mentioned
behaviour, and that the variation of the slopes both over time and from
country to country is too large to be neglected. Hence, the characterization
and understanding of income distribution is still an open problem. The
question that remains to be answered is which functional form is most
adequate for the majority of the population, which does not belong to the
power law part of the income distribution. Using data from several parts of the world,
a number of recent studies debate whether the low-middle range of the
income distribution may be fitted by an exponential [1-8] or a lognormal [9-13]
decreasing function. 1

In this paper we have analyzed three data sets relating to a pool of ma- 
jor industrialized countries for several years in order to add some empirical 
investigations to the ongoing debate on income distribution. When fits are 
performed, a two-parameter lognormal distribution is used for the low-middle 



1 Recently, a distribution proposed by [14,15] has the form of a deformed exponential
function:

$$P_\kappa(x) = \left(\sqrt{1 + \kappa^2 x^2} - \kappa x\right)^{1/\kappa}$$

which seems to capture well the behaviour of the income distribution at the
low-middle range as well as the power law tail.
part of the distribution (97%-99% of the population), while the upper
high-end tail (1%-3% of the population) is found to be consistent with a power law
type distribution. Our results show that the parameters of income distribution 
change in time; furthermore, we find that the probability density of income 
growth rates almost scales as an exponential function. 

The structure of the paper is as follows. Section 2 describes the data used in 
our study. Section 3 presents and analyzes the shape of the income distribution 
(Section 3.1) and its time development over the years covered by our data sets 
(Section 3.2). Section 4 concludes the paper. 



2 The Data 



We have used income data from the US Panel Study of Income Dynamics 
(PSID), the British Household Panel Survey (BHPS), and the German Socio- 
Economic Panel (GSOEP) as released in a cross-nationally comparable format 
in the Cross-National Equivalent File (CNEF). The CNEF brings together 
multiple waves of longitudinal data from the surveys above, and therefore 
provides relatively long panels of information. The current release of the CNEF 
includes data from 1980 to 2001 for the PSID, from 1991 to 2001 for the BHPS, 
and from 1984 to 2002 for the GSOEP. Our data refer to the period 1980-2001 
for the United States, and to the period 1991-2001 for the United Kingdom. 
As the eastern states of Germany were reunited with the western states of the 
Federal Republic of Germany in November 1990, the sample of families in
East Germany was merged with the existing data only at the beginning of the
1990s. Therefore, in order to perform analyses that represent the population 
of reunited Germany, we chose to refer to the subperiod 1990-2002 for the 
GSOEP. 

A key advantage of the CNEF is that it provides reliable estimates of annual 
income variables defined in a similar manner for all the countries that are 
not directly available in the original data sets. 2 It includes pre- and post- 
government household income, estimates of annual labour income, assets, pri- 
vate and public transfers, and taxes paid at household level. In this paper, the 
household post-government income variable (equal to the sum of total family 
income from labour earnings, asset flows, private transfers, private pensions, 
public transfers, and social security pensions minus total household taxes) 
serves as the basis for all income calculations. Following a generally accepted 
methodology, the concept of equivalent income will serve as a substitute for 
personal income, which is unobservable. Equivalent income x is calculated as 

2 Reference [16] offers a detailed description of the CNEF. See also the CNEF web
site for details: http://www.human.cornell.edu/pam/gsoep/equivfil.cfm
follows. In a first step, household income h is adjusted for household type θ
using an equivalence scale e(θ). 3 This adjusted household income x = h/e(θ)
is then attributed to every member of the given household, which implies that
income is distributed equally within households.
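The calculation of equivalent income can be sketched in Python as follows, using the "modified OECD" equivalence scale given in footnote 3. The function names and the example household are hypothetical and serve only to make the arithmetic concrete.

```python
def modified_oecd_scale(n_adults, n_children):
    """Modified OECD scale: 1 for the first adult, 0.5 per further adult, 0.3 per child."""
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children

def equivalent_income(household_income, n_adults, n_children):
    """Household income divided by the equivalence scale of the household."""
    return household_income / modified_oecd_scale(n_adults, n_children)

# Hypothetical household: two adults and two children with post-government income 42,000
scale = modified_oecd_scale(2, 2)            # 1 + 0.5 + 0.6
x = equivalent_income(42000.0, 2, 2)
# Every one of the four members is then assigned the same equivalent income x
members = [x] * 4
print(scale, x)
```

For this household the scale is about 2.1, so each member is assigned an equivalent income of about 20,000.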

In the most recent release, the average sample size varies from about 7,300 
households containing approximately 20,200 respondent individuals for the 
PSID-CNEF to 6,500 households with approximately 16,000 respondent indi- 
viduals for the BHPS-CNEF; for the GSOEP-CNEF data from 1990 to 2002, 
we have about 7,800 households containing approximately 20,400 respondent 
individuals. 

All the variables are in current year currency; therefore, we use the consumer 
price indices to convert into constant figures for all the countries. The base 
year is 1995. 
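A minimal sketch of the conversion to constant 1995 prices is given below. The CPI values are invented for illustration and are not the indices used in our calculations; only the rescaling rule (nominal value times base-year index divided by current-year index) is the point.

```python
def to_constant_prices(nominal, cpi_year, cpi_base):
    """Express a current-year amount in base-year prices."""
    return nominal * cpi_base / cpi_year

# Hypothetical consumer price indices (base year 1995 = 100)
cpi = {1990: 85.0, 1995: 100.0, 2000: 112.0}

income_1990 = 17000.0
real_1990 = to_constant_prices(income_1990, cpi[1990], cpi[1995])
real_1995 = to_constant_prices(25000.0, cpi[1995], cpi[1995])
print(real_1990, real_1995)
```

An amount from a year with lower prices than the base year is scaled up, while amounts from the base year are left unchanged.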



3 Empirical Findings 

3.1 The Shape of the Distribution

The main panels of Fig. 1 present the empirical
cumulative distribution of the equivalent income from our data sets for some
randomly selected years on a log-log scale. 4

[Fig. 1 about here.] 

As shown in the lower insets, the upper income tail (about 1%-3% of the
population) follows Pareto's law:

$$1 - F(x) = P(X > x) = C_\alpha x^{-\alpha} \qquad (2)$$

where $C_\alpha = k^\alpha$, $k, \alpha > 0$, and $k \le x < \infty$. Since the values of x above some
value $x_R$ cannot be observed due to tail truncation, to fit the (logarithm of
the) data for the majority of the population (until the 97th-99th percentiles of
3 We use the so-called "modified OECD" equivalence scale, which is defined for
each household as equal to 1 + 0.5 × (#adults - 1) + 0.3 × (#children).

4 To treat each wave of the surveys at hand as a cross-section, and to obtain 
population-based statistics, all calculations used sample weights which compensate 
for unequal probabilities of selection and sample attrition. Furthermore, to elimi- 
nate the influence of outliers, the data were trimmed. We also dropped observations 
with zero and negative incomes from all samples. 
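the income distribution) we use a right-truncated normal probability density
function:

$$
f(y) =
\begin{cases}
\dfrac{\dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{y-\mu}{\sigma}\right)^{2}\right]}{\displaystyle\int_{-\infty}^{y_R} f(y)\,dy}, & -\infty < y \le y_R \\[2ex]
0, & y_R < y < \infty
\end{cases}
\qquad (3)
$$

where $y = \log(x)$ and $y_R = \log(x_R)$; the $f(y)$ under the integral sign is the
untruncated normal density with mean $\mu$ and standard deviation $\sigma$. The fit to
Equation (3) is shown by the top insets of the pictures.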
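To make the right-truncated fit concrete, the sketch below writes the log-likelihood of a normal density renormalized on the interval up to the cutoff and maximizes it by a crude grid search. The grid search, the function names, and the synthetic log-incomes are assumptions chosen to keep the example self-contained; they do not describe the estimation routine actually used for the figures.

```python
import math
import random

def normal_cdf(y, mu, sigma):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def truncated_loglik(log_incomes, mu, sigma, y_r):
    """Log-likelihood of a normal density truncated on the right at y_r.

    Assumes every observation satisfies y <= y_r.
    """
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    log_mass = math.log(normal_cdf(y_r, mu, sigma))   # denominator of the density
    kernel = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in log_incomes)
    return kernel - len(log_incomes) * (log_norm + log_mass)

def fit_by_grid(log_incomes, y_r, mus, sigmas):
    """Pick the (mu, sigma) pair on the grid with the highest log-likelihood."""
    return max(((m, s) for m in mus for s in sigmas),
               key=lambda p: truncated_loglik(log_incomes, p[0], p[1], y_r))

random.seed(1)
true_mu, true_sigma = 10.0, 0.5
y_r = true_mu + 2.0 * true_sigma          # cutoff near the 98th percentile
draws = (random.gauss(true_mu, true_sigma) for _ in range(3000))
log_incomes = [y for y in draws if y <= y_r]

mus = [9.5 + 0.05 * i for i in range(21)]
sigmas = [0.3 + 0.05 * i for i in range(9)]
mu_hat, sigma_hat = fit_by_grid(log_incomes, y_r, mus, sigmas)
print(mu_hat, sigma_hat)
```

In practice a numerical optimizer would replace the grid; the grid is used here only so that the example needs nothing beyond the standard library.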

To select a suitable threshold or cutoff value $x_R$ separating the lognormal part
from the Pareto power law tail of the empirical income distribution, we use
visually oriented statistical techniques such as quantile-quantile (Q-Q) and
mean excess plots. Figure 2 gives an example of these graphical tools for the
countries at hand.



[Fig. 2 about here.] 

The left pictures in the figure plot the quantile function of the
standard exponential distribution (i.e., a distribution with a medium-sized
tail) against its empirical counterpart. If the sample comes from the hypothesized
distribution, or a linear transformation of it, the Q-Q plot is linear.
A concave departure from the straight line is an indication of a fat-tailed distribution.
Since a log-transformed Pareto random variable is exponentially distributed,
we repeat the analysis on the log-transformed data, excluding
some of the lower sample points, to investigate the concave departure region
of the plots and obtain a fit closer to the straight line. The results are shown
by the insets of the left pictures in the figure. The right pictures plot the empirical
average of the data that are larger than or equal to $x_R$, $E(X \mid X > x_R)$,
against $x_R$. If the plot is linear, then the distribution may be either of power type
or of exponential type. If the slope of the line is greater
than zero, it suggests a power type (as in the main panels); otherwise, if
the slope is equal to zero, it suggests an exponential type (as in the insets for
the log-transformed data).
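The two graphical tools can be sketched numerically as follows. The code assumes the standard mean excess function e(u) = E(X - u | X > u), for which an exponential sample gives a flat curve and a Pareto sample an upward-sloping line; the plotting positions, function names, and synthetic Pareto sample are likewise assumptions made for illustration.

```python
import math
import random

def exponential_qq(sample):
    """Pairs (standard exponential quantile, empirical quantile) for a Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i + 0.5) / n keep the argument of the log positive
    return [(-math.log(1.0 - (i + 0.5) / n), x) for i, x in enumerate(xs)]

def mean_excess(sample, threshold):
    """Empirical mean excess over a threshold: average of (x - u) for x > u."""
    excess = [x - threshold for x in sample if x > threshold]
    return sum(excess) / len(excess)

random.seed(2)
pareto = [random.paretovariate(2.5) for _ in range(50000)]   # power law sample
log_pareto = [math.log(x) for x in pareto]                   # exponential sample

# Mean excess at two thresholds: rising for the raw data, flat for the logs
me_raw = [mean_excess(pareto, u) for u in (1.5, 3.0)]
me_log = [mean_excess(log_pareto, u) for u in (0.2, 0.8)]
qq_points = exponential_qq(log_pareto)
print(me_raw, me_log)
```

Plotting `qq_points` for the log-transformed sample should give an approximately straight line, while the same plot for the raw sample bends away from it in the upper quantiles.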



3.2 Temporal Change of the Distribution 



The two-part structure of the empirical income distribution seems to hold
over the whole time span covered by our data sets. The distributions for all the years
and countries are shown in Fig. 3.

[Fig. 3 about here.] 
As one can easily recognize, the distribution shifts over the years covered by
our data sets. It is reasonable to assume that this shift originates
from the economic growth of the countries. To check this assumption, we study
the fluctuations in the growth rates of output and equivalent income, and try to
show that the evolution of both quantities is governed by similar mechanisms,
pointing in this way to the existence of a correlation between them, as
one would expect. We calculate the growth rates using the monthly series of
the Index of Industrial Production (IIP) from [17] for output and connecting
individual respondents' incomes over time for the equivalent income, 5 and
express them in terms of their logarithm. 6 To account for the fact that the
variance of the growth rates varies, we scale each growth rate by dividing by
the corresponding estimated standard deviation. In Fig. 4 we graph the empirical
probability density function for these scaled growth rates, where the
data points for the equivalent income in the main panels are the average over
the entire period covered by the CNEF surveys.
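The scaling step can be illustrated with a short Python sketch: logarithmic growth rates are computed from consecutive observations, detrended by their average, and divided by their estimated standard deviation. The income series is hypothetical and the function name is an assumption; the actual calculation also involves the longitudinal sample weights, which are omitted here.

```python
import math
import statistics

def scaled_log_growth(series):
    """Log growth rates, detrended by their mean and divided by their standard deviation."""
    rates = [math.log(later / earlier) for earlier, later in zip(series, series[1:])]
    mean_rate = statistics.mean(rates)
    sd_rate = statistics.stdev(rates)
    return [(r - mean_rate) / sd_rate for r in rates]

# Hypothetical equivalent incomes of one respondent in consecutive waves (1995 prices)
incomes = [20000.0, 20600.0, 19900.0, 21500.0, 22100.0, 21800.0]
scaled = scaled_log_growth(incomes)
print([round(r, 3) for r in scaled])
```

After this transformation the rates have zero mean and unit standard deviation, which is what allows growth rates from different populations to be compared on the same axes.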



[Fig. 4 about here.]

After scaling, the resulting empirical probability density functions appear
identical for observations drawn from different populations.
Remarkably, both curves display a simple "tent-shaped" form; hence,
the probability density functions are consistent with an exponential decay [18]:

$$f(r) = \frac{1}{\sigma\sqrt{2}} \exp\left(-\frac{\sqrt{2}\,|r - \bar{r}|}{\sigma}\right) \qquad (4)$$

where $-\infty < r < \infty$, $-\infty < \bar{r} < \infty$, and $\sigma > 0$. We test the hypothesis that
the two growth rate distributions have the same continuous distribution by
using the two-sample Kolmogorov-Smirnov (K-S) test; the results shown in
Table 1 indicate that this hypothesis cannot be rejected at the 5% level.

[Table 1 about here.]

These findings are in quantitative agreement with results reported on the
growth of firms and countries [19-26], leading us to the conclusion that the
data are consistent with the assumption that a common empirical law might
describe the growth dynamics of both countries and individuals.

5 To properly weight the sample of individuals represented in all the years of the
CNEF surveys, we use the individual's longitudinal sample weights.

6 All the data have been adjusted to 1995 prices and detrended by the average
growth rate, so values for different years are comparable.

Even if the functional form of the income distribution expressed as lognormal
with power law tail seems stable, its parameters fluctuate within narrow
bounds over the years for the same country. For example, the power law slope
takes values in the range α = [1.1, 3.34] for the US between 1980 and 2001, while the
curvature of the lognormal fit, as measured by the Gibrat index $\beta = 1/(\sigma\sqrt{2})$,
ranges between approximately β = 1 and β = 1.65; for the UK between 1991
and 2001, α = [3.47, 5.76] and β = [2.18, 2.73]; for Germany between 1990 and
2002, α = [2.42, 3.96] and β = [1.63, 2.14]. The time pattern of these parameters
is shown by the main panels of Fig. 5, which also reports in one of the
insets the temporal change of inequality as measured by the Gini coefficient.
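The two inequality measures mentioned above can be computed as in the following sketch. It assumes, for simplicity, that σ is the plain sample standard deviation of log incomes rather than the value obtained from the truncated fit, and it uses an unweighted rank formula for the Gini coefficient; the lognormal sample is synthetic. For an exactly lognormal population the Gini coefficient equals erf(σ/2), that is erf(1/(2√2 β)), so the two indices carry the same information.

```python
import math
import random
import statistics

def gibrat_index(incomes):
    """Gibrat index beta = 1 / (sigma * sqrt(2)), sigma = std. dev. of log incomes."""
    sigma = statistics.stdev([math.log(x) for x in incomes])
    return 1.0 / (sigma * math.sqrt(2.0))

def gini(incomes):
    """Unweighted Gini coefficient from the sorted-rank formula."""
    xs = sorted(incomes)
    n = len(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * sum(xs)) - (n + 1.0) / n

random.seed(4)
# Synthetic lognormal incomes with sigma = 0.5 (illustrative, not survey data)
sample = [random.lognormvariate(10.0, 0.5) for _ in range(20000)]
beta = gibrat_index(sample)
g = gini(sample)
g_from_beta = math.erf(1.0 / (2.0 * math.sqrt(2.0) * beta))
print(round(beta, 3), round(g, 3), round(g_from_beta, 3))
```

A larger Gibrat index corresponds to a smaller σ and hence to a lower Gini coefficient, so the two series are expected to move in opposite directions when the lognormal model describes the data well.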



[Fig. 5 about here.] 



The information about inequality provided by the
Gibrat index is close to that provided by the Gini coefficient,
which further confirms that the lognormal law is a good
model for the low-middle incomes of the distribution. The Pareto index, by
contrast, changes rather strongly. Among other components, the definition of income we
use in our analysis includes asset flows. It is reasonable to
assume that for the top 1% to 3% of the population returns on capital gains
rather than labour earnings account for the majority share of total income.
This suggests that stock market fluctuations might be an important factor
behind the trend of income inequality among the richest, and that capital
income plays an important role in determining the Pareto functional form of
the observed empirical income distribution at the high income range [27]. The
other insets of the pictures also show the time evolution of various parameters
characterizing the income distribution, such as the income separating the lognormal
and Pareto regimes (selected as explained in Section 3.1), the fraction
of the population in the upper tail of the distribution, and the share of total income
which this fraction accounts for. 7 One can observe that the fraction of the
population and the share of income in the Pareto tail move together in the
opposite direction with respect to the cutoff value separating the body of the
distribution from its tail, and the latter seems to track the temporal evolution
of the Pareto index. This means that a decrease (increase) of the power
law slope and the accompanying decrease (increase) of the threshold value $x_R$
imply a greater (smaller) fraction of the population in the tail and a greater
(smaller) share of the total income which this population accounts for, as well
as a greater (smaller) level of inequality among the high income population.



7 The share of total income in the tail of the distribution is calculated as $\mu_\alpha / \mu$,
where $\mu_\alpha$ is the average income of the population in the Pareto tail and $\mu$ is the
average income of the whole population.
4 Summary and Conclusions

Our analysis of the data for the US, the UK, and Germany shows that there 
are two regimes in the income distribution. For the low-middle class up to 
approximately 97%-99% of the total population the incomes are well described 
by a two-parameter lognormal distribution, while the incomes of the top 1%- 
3% are described by a power law (Pareto) distribution. 

This structure has been observed in our analysis for different years. However, 
the distribution shows a rightward shift in time. Therefore, we analyze the 
output and individual income growth rate distribution from which we observe 
that, after scaling, the resulting empirical probability density functions appear 
similar for observations coming from different populations. This effect, which 
is statistically tested by means of a two-sample Kolmogorov-Smirnov test, 
raises the intriguing possibility that a common mechanism might characterize 
the growth dynamics of both output and individual income, pointing in this 
way to the existence of a correlation between these quantities. Furthermore, 
from the analysis of the temporal change of the parameters specifying the 
distribution, we find that these quantities do not necessarily correlate with each
other. This means that different mechanisms are working in the distribution of
the low-middle income range and that of the high income range. Since earnings 
from financial or other assets play an important role in the high income section 
of the distribution, one possible origin of this behaviour might be the change 
of the asset price, which mainly affects the level of inequality at the very top 
of the income distribution and is likely to be responsible for the power law 
nature of high incomes. 
(a) United States (1996)

(b) United Kingdom (2001)

(c) Germany (1991)

Fig. 1. The cumulative probability distribution of the equivalent income in the
log-log scale along with the lognormal (top insets) and Pareto (lower insets) fits for
some randomly selected years
(a) United States (1996)

(b) United States (1996)

(c) United Kingdom (2001)

(d) United Kingdom (2001)

(e) Germany (1991)

(f) Germany (1991)

Fig. 2. Q-Q plots (left pictures) against standard exponential quantiles and mean
excess plots (right pictures) against threshold values for some randomly selected
years. A concave departure from the straight line in the Q-Q plot (as in the left
main panels) or an upward sloping mean excess function (as in the right main
panels) indicates a heavy tail in the sample distribution. The insets in the pictures
apply the same graphical tools to the log-transformed data



(a) United States (1980-2001): empirical distribution of the CNEF-PSID data on
logarithmically scaled axes; horizontal axis: income (1995 USD)

(b) United Kingdom (1991-2001): empirical distribution of the CNEF-BHPS data on
logarithmically scaled axes

(c) Germany (1990-2002): empirical distribution of the CNEF-GSOEP data on
logarithmically scaled axes

Fig. 3. Time development of the income distribution for all the countries and years
(a) United States (1980-2001)

(b) United Kingdom (1991-2001)

(c) Germany (1990-2002)

Fig. 4. The probability distribution of equivalent income (main panels) and IIP
(insets) growth rate for all the countries and years
(c) Germany (1990-2002): Pareto and Gibrat index (CNEF-GSOEP, 1990-2002)

Fig. 5. Temporal evolution of various parameters characterizing the income distribution
Tables

Table 1
Two-sample Kolmogorov-Smirnov test statistics and p-values for both output and
equivalent income growth rate data for all the countries

Country           K-S test statistic    p-value
United States     0.0761                0.1133
United Kingdom    0.0646                0.6464
Germany           0.0865                0.2050
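A comparison of the kind summarized in Table 1 can be sketched in Python as below. The code computes the two-sample K-S statistic as the largest gap between the two empirical distribution functions and approximates the p-value with the asymptotic Kolmogorov series; this approximation, the function names, and the synthetic Laplace samples are assumptions, and the output is not expected to reproduce the figures in the table.

```python
import bisect
import math
import random

def ks_two_sample(a, b):
    """Two-sample K-S statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for v in a + b:
        # Empirical distribution functions of both samples evaluated at v
        fa = bisect.bisect_right(a, v) / n
        fb = bisect.bisect_right(b, v) / m
        d = max(d, abs(fa - fb))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        return d, 1.0          # the series below is numerically 1 in this range
    # Kolmogorov series: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def laplace(scale):
    """Laplace draw: the difference of two exponentials with mean `scale`."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

random.seed(3)
income_growth = [laplace(1.0) for _ in range(400)]   # stand-in for scaled income growth
output_growth = [laplace(1.0) for _ in range(250)]   # stand-in for scaled IIP growth
d, p = ks_two_sample(income_growth, output_growth)
print(f"D = {d:.4f}, p = {p:.4f}")
```

A p-value above 0.05 means that the hypothesis of a common distribution is not rejected at the 5% level, which is how the entries of Table 1 are read.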